Large-scale correlation mining for biomolecular network discovery

Authors

  • Alfred Hero
  • Bala Rajaratnam
Abstract

Continuing advances in high-throughput mRNA probing, gene sequencing and microscopic imaging technology are producing a wealth of biomarker data on many different living organisms and conditions. Scientists hope that increasing amounts of relevant data will eventually lead to a better understanding of the network of interactions between the thousands of molecules that regulate these organisms. Thus, progress in understanding the biology has become increasingly dependent on progress in data science. Data mining tools have been particularly relevant because they can sometimes effectively separate the “wheat” from the “chaff”, winnowing the massive amount of data down to a few important data dimensions. Correlation mining is a data mining tool that is especially useful for probing statistical correlations between biomarkers and recovering properties of their correlation networks. However, the number of pairwise correlations grows quadratically with the number of biomarkers (p biomarkers yield p(p-1)/2 pairs), so the scalability of correlation mining becomes an issue in the big data setting. Furthermore, correlation mining discoveries are governed by phase transitions that must be understood for these discoveries to be reliable and of high confidence. This is especially important at big data scales, where the number of samples is fixed and the number of biomarkers becomes unbounded, a sampling regime referred to as the “purely high-dimensional setting.” In this chapter, we discuss some of the main advances and challenges in correlation mining in the context of large-scale biomolecular networks, with a focus on medicine. We also introduce a new correlation mining application: discovery of correlation sign flips between edges in a pair of correlation or partial correlation networks. The pair of networks could correspond respectively to a disease (or treatment) group and a control group.
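
The sign-flip application described above can be illustrated with a minimal sketch. It rests on several assumptions. It uses plain Pearson sample correlations. It applies a single hypothetical screening threshold `rho` to both groups. It also assumes that every biomarker has non-zero sample variance. The chapter itself may use partial correlations or different screening rules. The function names and toy data are hypothetical. The exhaustive loop over pairs also makes the quadratic cost concrete.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def num_pairs(p):
    """Number of distinct biomarker pairs, p(p-1)/2, which is quadratic in p."""
    return p * (p - 1) // 2


def correlation_edges(samples, rho):
    """Hypothetical thresholded correlation network: {(i, j): r_ij} with |r_ij| >= rho.

    samples: n rows (samples), each holding p biomarker values.
    Visits all num_pairs(p) pairs, so the cost grows quadratically in p.
    """
    p = len(samples[0])
    cols = [[row[k] for row in samples] for k in range(p)]
    edges = {}
    for i, j in combinations(range(p), 2):
        r = pearson(cols[i], cols[j])
        if abs(r) >= rho:
            edges[(i, j)] = r
    return edges


def sign_flips(group_a, group_b, rho):
    """Edges present in both thresholded networks whose correlation sign differs."""
    ea = correlation_edges(group_a, rho)
    eb = correlation_edges(group_b, rho)
    return sorted(e for e in ea.keys() & eb.keys() if ea[e] * eb[e] < 0)


# Hypothetical toy data: 4 samples x 3 biomarkers per group.
control = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1]]
disease = [[1, 8, 5], [2, 6, 3], [3, 4, 4], [4, 2, 1]]

if __name__ == "__main__":
    print(sign_flips(disease, control, rho=0.8))  # → [(0, 1), (1, 2)]
```

In the toy data, biomarkers 0 and 1 are perfectly positively correlated in the control group and perfectly negatively correlated in the disease group, so their edge flips sign. At realistic scales this exhaustive pairwise loop is the bottleneck. For example, p = 20,000 biomarkers already give 199,990,000 pairs.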
This paper is to appear as a chapter in the book Big Data over Networks from Cambridge University Press (ISBN: 9781107099005).

Similar sources

Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies that must be understood, and their risks analysed, before the project can begin if it is to succeed. When planning such an industrial-scale project, the list of technologies and the associations between them is often complex and forms a network. Discovery of this network of technologies is time consumi...

Mining and Dynamic Simulation of Sub-Networks from Large Biomolecular Networks

Biomolecular networks dynamically respond to stimuli and implement cellular function. Understanding these dynamic changes is the key challenge for cell biologists. As biomolecular networks grow in size and complexity, computer simulation becomes an essential tool for understanding biomolecular network models. This paper presents a novel method to mine, model and evaluate the regulatory system (a typ...

Expert Discovery: A web mining approach

Expert discovery seeks an answer to the question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” Experts with domain knowledge in any field are crucial for consulting in industry, academia and the scientific community. The aim of this study is to address the issues in the expert-finding task in a real-world community. Collabor...

Using Mapreduce to Scale Events Correlation Discovery for Business Processes Mining

Hicham Reguieg, Farouk Toumani, Hamid Reza Motahari Nezhad, Boualem Benatallah. HP Laboratories, HPL-2012-170. Keywords: business processes; Event Correlation; map reduce. The volume of data related to business process execution is increasing significantly in the enterprise. Many data sources include events related to th...

Text-Mining Needs And Solutions For The Biomolecular Interaction Network Database (BIND)

Proteomics represents a collection of experimental approaches that may be used to investigate biological systems. Such approaches commonly produce vast amounts of data that relate to physical interactions between biomolecules. One challenge in extracting useful information from these data is determining how they relate to current knowledge. The goal of the BIND database (Biomolecular Interactio...

Publication date: 2016